2,995 research outputs found

    Bandit Models of Human Behavior: Reward Processing in Mental Disorders

    Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for the multi-armed bandit problem that extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions.
    Comment: Conference on Artificial General Intelligence, AGI-1
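    To make the idea concrete, here is a minimal sketch (not the paper's exact model) of Beta-Bernoulli Thompson Sampling in which positive and negative outcomes update the posterior with separate weights, mimicking a reward-processing bias; the parameter names lam_pos and lam_neg are illustrative assumptions.

```python
import numpy as np

def biased_thompson_sampling(true_means, horizon, lam_pos=1.0, lam_neg=1.0, seed=0):
    # Sketch only: lam_pos / lam_neg scale how strongly wins vs. losses
    # are assimilated into the Beta posterior; 1.0/1.0 recovers standard
    # Thompson Sampling. These parameters are illustrative assumptions.
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # Beta posterior "success" counts
    beta = np.ones(k)   # Beta posterior "failure" counts
    total_reward = 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))  # sample, pick the best
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += lam_pos * reward          # biased positive update
        beta[arm] += lam_neg * (1.0 - reward)   # biased negative update
        total_reward += reward
    return total_reward

print(biased_thompson_sampling([0.3, 0.5, 0.7], horizon=2000))
```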

    Concurrent bandits and cognitive radio networks

    We consider the problem of multiple users targeting the arms of a single multi-armed stochastic bandit. The motivation for this problem comes from cognitive radio networks, where selfish users need to coexist without any side communication between them, implicit cooperation or common control. Even the number of users may be unknown and can vary as users join or leave the network. We propose an algorithm that combines an $\epsilon$-greedy learning rule with a collision avoidance mechanism. We analyze its regret with respect to the system-wide optimum and show that sub-linear regret can be obtained in this setting. Experiments show dramatic improvement compared to other algorithms for this setting.
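    A toy simulation of this setting (not the paper's exact algorithm) might look as follows: each non-communicating user runs $\epsilon$-greedy over the shared arms, a collision yields zero reward, and colliding users back off to a random arm on the next round as a crude collision-avoidance mechanism.

```python
import random

def simulate(n_users=3, means=(0.9, 0.8, 0.7, 0.2), horizon=5000, eps=0.05, seed=0):
    rng = random.Random(seed)
    k = len(means)
    counts = [[0] * k for _ in range(n_users)]   # per-user pull counts
    sums = [[0.0] * k for _ in range(n_users)]   # per-user reward sums
    backoff = [None] * n_users  # forced random arm after a collision
    total = 0.0
    for _ in range(horizon):
        picks = []
        for u in range(n_users):
            if backoff[u] is not None:           # collision-avoidance move
                arm, backoff[u] = backoff[u], None
            elif rng.random() < eps:             # explore
                arm = rng.randrange(k)
            else:                                # exploit empirical leader
                est = [sums[u][a] / counts[u][a] if counts[u][a] else 1.0
                       for a in range(k)]
                arm = max(range(k), key=est.__getitem__)
            picks.append(arm)
        for u, arm in enumerate(picks):
            collided = picks.count(arm) > 1      # two users on one arm
            r = 0.0 if collided else float(rng.random() < means[arm])
            counts[u][arm] += 1
            sums[u][arm] += r
            total += r
            if collided:
                backoff[u] = rng.randrange(k)    # back off randomly
    return total

print(simulate())
```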

    ELAN as flexible annotation framework for sound and image processing detectors

    Annotation of digital recordings in humanities research is still, to a large extent, a process that is performed manually. This paper describes the first pattern recognition based software components developed in the AVATecH project and their integration in the annotation tool ELAN. AVATecH (Advancing Video/Audio Technology in Humanities Research) is a project that involves two Max Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen; Max Planck Institute for Social Anthropology, Halle) and two Fraunhofer Institutes (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS, Sankt Augustin; Fraunhofer Heinrich-Hertz-Institute, Berlin) and that aims to develop and implement audio and video technology for semi-automatic annotation of heterogeneous media collections as they occur in multimedia-based research. The highly diverse nature of the digital recordings stored in the archives of both Max Planck Institutes poses a huge challenge to most of the existing pattern recognition solutions and is a motivation to make such technology available to researchers in the humanities.

    Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation

    Task decomposition is effective in manifold applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictabilities and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA lightweight arm and on a pick-and-delivery task with a Pioneer robot.
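    As a point of reference, the following is a minimal sketch of the UCB1 selection rule at the heart of UCT-style search, which a planner in this vein uses to decide which child action to explore next; the exploration constant c is a tunable assumption.

```python
import math

def uct_select(child_values, child_visits, parent_visits, c=1.41):
    """Return the index of the child maximizing mean value + exploration bonus."""
    best, best_score = 0, float("-inf")
    for i, (v, n) in enumerate(zip(child_values, child_visits)):
        if n == 0:
            score = float("inf")  # always try unvisited children first
        else:
            score = v / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Example: three actions with accumulated returns and visit counts;
# the unvisited third action is selected for exploration.
print(uct_select([10.0, 4.0, 0.0], [20, 5, 0], parent_visits=25))
```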

    School adjustment of ethnic minority youth: A qualitative and quantitative research synthesis of family-related risk and resource factors

    In today’s multicultural societies, the question of how school adjustment (adapting to the role of being a student) can be promoted for students from ethnic minority backgrounds is of high importance. The ecological approach to acculturation research proposes that minority students’ school adjustment is shaped by the surrounding context, and it suggests that the family microsystem plays an important role. Specifically, parents’ acculturation, practices, attitudes, and background have been identified as key factors. While systematic reviews of the impact of parental factors more broadly do exist, some of which address ethnic minorities, a comprehensive literature review of family-related factors that affect ethnic minority youth’s school adjustment is missing. The present study provides a synthesis of the relevant qualitative and quantitative empirical research, including 60 qualitative and 46 quantitative studies. Its content analysis portrays the ways in which parental acculturation, practices, attitudes and background can support or hamper school adjustment among ethnic minority youth. A subsequent meta-analysis quantifies the strength of the impact of these parental variables on the school adjustment of their children. Our findings show that parental practices have the most crucial impact on the psychological well-being, academic self-esteem and aspirations, behaviour and achievement outcomes of minority youth.

    Bandit Online Optimization Over the Permutahedron

    The permutahedron is the convex polytope whose vertex set consists of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi_t(i) s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to within improved accuracy as $T$ grows, resulting in a total running time that is superlinear in $T$, making it impractical for large time horizons. We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time complexity $O(n^3 T)$. The ideas are a combination of CombBand and a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full-information setting. The technical core is a bound on the variance of the Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
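    For illustration, here is a minimal sketch of the Plackett-Luce noisy sorting process mentioned above: items are drawn without replacement with probability proportional to the exponential of their scores, producing a random permutation; the scores used here are illustrative.

```python
import numpy as np

def plackett_luce_sample(theta, seed=0):
    """Draw a random permutation of 0..n-1 from a Plackett-Luce model."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    remaining = list(range(len(theta)))
    order = []
    while remaining:
        w = np.exp(theta[remaining])                 # choice weights
        pick = rng.choice(len(remaining), p=w / w.sum())
        order.append(remaining.pop(pick))            # draw without replacement
    return order

# Higher-scored items tend to appear earlier in the sampled permutation.
print(plackett_luce_sample([2.0, 1.0, 0.0, -1.0]))
```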

    Boosting parallel perceptrons for label noise reduction in classification problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/11499305_60. Proceedings of the First International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2005, Las Palmas, Canary Islands, Spain, June 15-18, 2005.
    Boosting combines an ensemble of weak learners to construct a new weighted classifier that is often more accurate than any of its components. The construction of such learners, whose training sets depend on the performance of the previous members of the ensemble, is carried out by successively focusing on those patterns that are harder to classify. This fact deteriorates boosting’s results when dealing with malicious noise, for instance mislabeled training examples. In order to detect and avoid those noisy examples during the learning process, we propose the use of Parallel Perceptrons. Among other things, these novel machines allow one to naturally define margins for hidden unit activations. We shall use these margins to detect which patterns may have an incorrect label and also which are safe, in the sense of being well represented in the training sample by many other similar patterns. We shall reduce the weights of the former, as candidates for being noisy examples, and augment the weights of the latter, as a support for the overall detection procedure.
    With partial support of Spain’s CICyT, TIC 01–572, TIN 2004–0767
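    The margin-based reweighting idea can be sketched as follows (the thresholds and scaling factors below are illustrative assumptions, not the paper's values): inside a boosting round, examples whose activation margin is strongly negative are treated as suspect, possibly mislabeled, and down-weighted, while large-margin "safe" examples are up-weighted.

```python
import numpy as np

def reweight_by_margin(weights, margins, suspect_thr=-0.5, safe_thr=0.5,
                       down=0.5, up=1.5):
    # All thresholds/factors are illustrative, not from the paper.
    weights = np.asarray(weights, dtype=float).copy()
    margins = np.asarray(margins, dtype=float)
    weights[margins < suspect_thr] *= down   # candidate noisy examples
    weights[margins > safe_thr] *= up        # well-supported examples
    return weights / weights.sum()           # keep a valid distribution

# The strongly misclassified first example loses weight; safe ones gain.
print(reweight_by_margin([0.25, 0.25, 0.25, 0.25], [-0.9, 0.1, 0.8, 0.6]))
```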

    On the Prior Sensitivity of Thompson Sampling

    The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.
    Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201
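    A toy experiment in the spirit of this result: run Thompson Sampling over two candidate Bernoulli reward models with prior mass $p$ on the true one, and compare the regret of a bad prior against a good one; all numbers are illustrative, and the paper's setting is more specific.

```python
import numpy as np

def ts_two_models(p=0.1, horizon=2000, seed=0):
    rng = np.random.default_rng(seed)
    models = [np.array([0.7, 0.3]),   # model 0 (true): arm 0 is best
              np.array([0.3, 0.7])]   # model 1 (wrong): arm 1 is best
    log_post = np.log(np.array([p, 1.0 - p]))  # prior mass p on the truth
    regret = 0.0
    for _ in range(horizon):
        # Sample a model from the posterior and play its best arm.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        m = rng.choice(2, p=post)
        arm = int(np.argmax(models[m]))
        reward = float(rng.random() < models[0][arm])  # model 0 generates rewards
        regret += 0.7 - models[0][arm]
        # Bayesian update of the model posterior from the observed reward.
        for i in range(2):
            lik = models[i][arm] if reward else 1.0 - models[i][arm]
            log_post[i] += np.log(lik)
    return regret

# A bad prior (small p) should cost noticeably more regret than a good one.
print(ts_two_models(p=0.05), ts_two_models(p=0.95))
```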

    Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling

    Greedy heuristics may be attuned by looking ahead for each possible choice, in an approach called the rollout or Pilot method. These methods may be seen as meta-heuristics that can enhance (any) heuristic solution by repetitively modifying a master solution: similarly to what is done in game tree search, better choices are identified using lookahead, based on solutions obtained by repeatedly using a greedy heuristic. This paper first illustrates how the Pilot method improves upon some simple, well-known dispatch heuristics for the job-shop scheduling problem. The Pilot method is then shown to be a special case of the more recent Monte Carlo Tree Search (MCTS) methods: unlike the Pilot method, MCTS methods use random completion of partial solutions to identify promising branches of the tree. The Pilot method and a simple version of MCTS, using the $\varepsilon$-greedy exploration paradigm, are then compared within the same framework, consisting of 300 scheduling problems of varying sizes with a fixed budget of rollouts. Results demonstrate that MCTS reaches results better than or equal to those of the Pilot method in this context.
    Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012
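    The Pilot/rollout step can be sketched generically: score each candidate next decision by completing the remaining schedule with the greedy base heuristic, then commit to the best-scoring decision. The functions candidates, apply, complete_greedily and cost below are hypothetical problem-specific stubs, not names from the paper.

```python
def pilot_step(partial, candidates, apply, complete_greedily, cost):
    """One Pilot-method decision: greedy-lookahead score for each choice.

    All four callables are hypothetical stubs the caller supplies, e.g.
    for job-shop scheduling, cost(full) would return the makespan.
    """
    best_choice, best_cost = None, float("inf")
    for choice in candidates(partial):
        trial = apply(partial, choice)      # tentatively extend the schedule
        full = complete_greedily(trial)     # finish it with the base heuristic
        c = cost(full)                      # evaluate the completed solution
        if c < best_cost:
            best_choice, best_cost = choice, c
    return best_choice, best_cost
```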

    Braess's Paradox in Wireless Networks: The Danger of Improved Technology

    When comparing new wireless technologies, it is common to consider the effect that they have on the capacity of the network (defined as the maximum number of simultaneously satisfiable links). For example, it has been shown that giving receivers the ability to do interference cancellation, or allowing transmitters to use power control, never decreases the capacity and can in certain cases increase it by $\Omega(\log(\Delta \cdot P_{\max}))$, where $\Delta$ is the ratio of the longest link length to the smallest transmitter-receiver distance and $P_{\max}$ is the maximum transmission power. But there is no reason to expect the optimal capacity to be realized in practice, particularly since maximizing the capacity is known to be NP-hard. In reality, we would expect links to behave as self-interested agents, and thus when introducing a new technology it makes more sense to compare the values reached at game-theoretic equilibria than the optimum values. In this paper we initiate this line of work by comparing various notions of equilibria (particularly Nash equilibria and no-regret behavior) when using a supposedly "better" technology. We show a version of Braess's Paradox for all of them: in certain networks, upgrading technology can actually make the equilibria worse, despite an increase in the capacity. We construct instances where this decrease is a constant factor for power control, interference cancellation, and improvements in the SINR threshold ($\beta$), and is $\Omega(\log \Delta)$ when power control is combined with interference cancellation. However, we show that these examples are basically tight: the decrease is at most $O(1)$ for power control, interference cancellation, and improved $\beta$, and is at most $O(\log \Delta)$ when power control is combined with interference cancellation.
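    The capacity notion used here rests on an SINR feasibility check, sketched below: a set of links is simultaneously satisfiable if every receiver's signal-to-interference-plus-noise ratio clears the threshold $\beta$. The geometry, path-loss exponent and noise values are illustrative assumptions.

```python
import math

def feasible(links, power, beta=1.0, alpha=3.0, noise=1e-9):
    """links: list of (sender_xy, receiver_xy) pairs; uniform transmit power.

    Improved decoding hardware corresponds to a *lower* required beta,
    relaxing the constraint. All parameter values here are illustrative.
    """
    def gain(p, q):  # simple path-loss channel gain between two points
        return power / (math.dist(p, q) ** alpha)

    for i, (_, rx) in enumerate(links):
        signal = gain(links[i][0], rx)
        interference = sum(gain(tx, rx)
                           for j, (tx, _) in enumerate(links) if j != i)
        if signal / (noise + interference) < beta:
            return False
    return True

# Two well-separated unit-length links are feasible at beta = 1 but become
# infeasible once the threshold exceeds the achievable SINR.
links = [((0, 0), (1, 0)), ((10, 0), (11, 0))]
print(feasible(links, power=1.0, beta=1.0),
      feasible(links, power=1.0, beta=2000.0))
```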
    • 

    corecore